A Bayesian framework for de novo mutation calling in parents-offspring trios
Authors
Abstract
MOTIVATION: Spontaneous (de novo) mutations play an important role in the etiology of a range of complex diseases. Identifying de novo mutations (DNMs) in sporadic cases is an effective strategy for finding genes or genomic regions implicated in the genetics of disease. High-throughput next-generation sequencing enables genome- or exome-wide detection of DNMs by sequencing parents-proband trios. It is challenging to sift true mutations from the massive amount of noise caused by sequencing errors and alignment artifacts. A critical limitation of existing methods is that they assume the same pre-specified mutation rate for all genomic regions, which has a significant impact on DNM calling accuracy.
RESULTS: In this study, we developed and implemented a novel Bayesian framework for DNM calling in trios (TrioDeNovo), which overcomes this limitation by disentangling prior mutation rates from the evaluation of the likelihood of the data, so that flexible priors can be adjusted post hoc at different genomic sites. Through extensive simulations and application to real data, we showed that this new method has improved sensitivity and specificity over existing methods, and provides a flexible framework to further improve efficiency by incorporating proper priors. Accuracy is further improved by effective filtering based on sequence alignment characteristics.
AVAILABILITY AND IMPLEMENTATION: The C++ source code implementing TrioDeNovo is freely available at https://medschool.vanderbilt.edu/cgg.
CONTACT: [email protected]
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
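The separation of prior mutation rates from the data likelihood described in the abstract can be illustrated with the generic Bayesian identity "posterior odds = Bayes factor × prior odds". The sketch below is not taken from the TrioDeNovo source code: the function name, the log10 scale of the evidence score, and the numeric priors are all assumptions made for illustration. It only shows how a data-only evidence score, computed once, could be combined afterwards with a site-specific prior.

```python
import math


def posterior_denovo_probability(log10_bayes_factor: float,
                                 prior_mutation_prob: float) -> float:
    """Combine a data-only evidence score with a site-specific prior.

    Hypothetical helper (not the TrioDeNovo API). It applies
    posterior odds = Bayes factor * prior odds, working in log10 space.
    """
    if not 0.0 < prior_mutation_prob < 1.0:
        raise ValueError("prior_mutation_prob must lie strictly between 0 and 1")

    # Prior odds of a de novo mutation at this site, in log10.
    log10_prior_odds = (math.log10(prior_mutation_prob)
                        - math.log10(1.0 - prior_mutation_prob))
    log10_posterior_odds = log10_bayes_factor + log10_prior_odds

    # Convert odds to a probability without overflowing for large scores.
    if log10_posterior_odds >= 0:
        return 1.0 / (1.0 + 10.0 ** (-log10_posterior_odds))
    odds = 10.0 ** log10_posterior_odds
    return odds / (1.0 + odds)


# Illustrative values only: the same evidence score evaluated under a
# genome-wide prior and under a higher prior for an assumed mutable site.
genome_wide = posterior_denovo_probability(8.0, 1e-8)
mutable_site = posterior_denovo_probability(8.0, 1e-6)
```

Because the evidence score does not depend on the prior, only the final conversion has to be repeated when a different prior is chosen for a site; the likelihood computation on the sequencing data is not rerun.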
Similar resources
Joint Variant and De Novo Mutation Identification on Pedigrees from High-Throughput Sequencing Data
The analysis of whole-genome or exome sequencing data from trios and pedigrees has been successfully applied to the identification of disease-causing mutations. However, most methods used to identify and genotype genetic variants from next-generation sequencing data ignore the relationships between samples, resulting in significant Mendelian errors, false positives and negatives. Here we presen...
Imputation of parent-offspring trios and their effect on accuracy of genomic prediction using Bayesian method
The objective of this study was to evaluate the imputation accuracy of parent-offspring trios under different scenarios. By using simulated datasets, the performance of Bayesian LASSO in genomic prediction was also examined. The genome consisted of 5 chromosomes and each chromosome was set as 1 Morgan length. The number of SNPs per chromosome was 10000. One hundred QTLs were randomly distributed a...
New observations on maternal age effect on germline de novo mutations
Germline mutations are the source of evolution and contribute substantially to many health-related processes. Here we use whole-genome deep sequencing data from 693 parents-offspring trios to examine the de novo point mutations (DNMs) in the offspring. Our estimate for the mutation rate per base pair per generation is 1.05 × 10^(-8), well within the range of previous studies. We show that matern...
Inference of Candidate Germline Mutator Loci in Humans from Genome-Wide Haplotype Data
The rate of germline mutation varies widely between species but little is known about the extent of variation in the germline mutation rate between individuals of the same species. Here we demonstrate that an allele that increases the rate of germline mutation can result in a distinctive signature in the genomic region linked to the affected locus, characterized by a number of haplotypes with a...
A Likelihood-Based Framework for Variant Calling and De Novo Mutation Detection in Families
Family samples, which can be enriched for rare causal variants by focusing on families with multiple extreme individuals and which facilitate detection of de novo mutation events, provide an attractive resource for next-generation sequencing studies. Here, we describe, implement, and evaluate a likelihood-based framework for analysis of next generation sequence data in family samples. Our frame...
Journal:
- Bioinformatics
Volume 31, Issue 9
Pages: -
Publication date: 2015